Mining Protein Contact Maps
Authors
Abstract
The 3D conformation of a protein may be compactly represented as a symmetric, square, boolean matrix of pairwise inter-residue contacts, or “contact map”. The contact map provides a host of useful information about the protein’s structure. In this paper we describe how data mining can be used to extract valuable information from contact maps. For example, clusters of contacts represent certain secondary structures and also capture non-local interactions, giving clues to the tertiary structure. We focus on two main tasks: 1) given a database of protein sequences, discover an extensive set of frequent, non-local dense patterns in their contact maps and compile a library of such non-local interactions; 2) cluster these patterns based on their similarities and evaluate the clustering quality. We show via experiments that our techniques are effective in characterizing contact patterns across different proteins, and can be used to improve contact map prediction for unknown proteins as well as to learn protein folding pathways.
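As an illustration of the representation described in the abstract, the following minimal sketch builds a boolean contact map from residue coordinates and lists its non-local contacts. The 8.0 Å distance threshold, the minimum sequence separation of 4, the function names, and the toy hairpin coordinates are all assumptions made for this example; the abstract does not specify any of them, and this is not the authors' mining method.

```python
import math

def contact_map(coords, threshold=8.0):
    """Build a symmetric boolean matrix of pairwise residue contacts.

    coords: one (x, y, z) point per residue (e.g. C-alpha positions).
    threshold: assumed contact distance cutoff; the diagonal is left False.
    """
    n = len(coords)
    cmap = [[False] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) <= threshold:
                # Set both entries so the matrix stays symmetric.
                cmap[i][j] = cmap[j][i] = True
    return cmap

def non_local_contacts(cmap, min_separation=4):
    """Return contacts (i, j) with j - i >= min_separation (assumed cutoff)."""
    n = len(cmap)
    return [
        (i, j)
        for i in range(n)
        for j in range(i + min_separation, n)
        if cmap[i][j]
    ]

# Hypothetical toy chain shaped like a hairpin: four residues out, four back.
coords = [
    (0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0),
    (11.4, 3.8, 0.0), (7.6, 3.8, 0.0), (3.8, 3.8, 0.0), (0.0, 3.8, 0.0),
]

cmap = contact_map(coords)
print(non_local_contacts(cmap))
```

In this toy chain the non-local contacts pair residues from opposite strands of the hairpin, which is the kind of off-diagonal cluster the abstract refers to as a clue to tertiary structure.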
Similar resources
Mining of protein contact maps for protein fold prediction
Contact maps have been used in ab initio methods for the protein structure prediction problem. Secondary structures and contacts made by the residues are clearly visible in contact maps, where helices are seen as thick bands and beta sheets are seen as orthogonal to the diagonal. This paper explores the idea of extracting rules from contact maps to represent “protein fold” inf...
LGM: Mining Frequent Subgraphs from Linear Graphs
A linear graph is a graph whose vertices are totally ordered. Biological and linguistic sequences with interactions among symbols are naturally represented as linear graphs. Examples include protein contact maps, RNA secondary structures and predicate-argument structures. Our algorithm, linear graph miner (LGM), leverages the vertex order for efficient enumeration of frequent subgraphs. Based o...
Mining Dense Patterns from Off Diagonal Protein Contact Maps
The three-dimensional structure of a protein enables it to carry out biophysical and biochemical functions in a cell. Protein contact maps are 2D representations of contacts among the amino acid residues in the folded protein structure. Proteins are biochemical compounds consisting of one or more polypeptides, facilitating a biological function. Many researchers make note of the way secondary...
Selecting protein fuzzy contact maps through information and structure measures
Protein contact maps are representations of a protein's three-dimensional folding topology. A fuzzy generalization of contact maps (FGCM) gives the researcher flexibility not present in standard (i.e. crisp) contact maps, but it also changes the information content of the data. To aid in the rational, rather than ad hoc, selection of generalized contact map parameters we introduce some un...
GPCRRD: G protein-coupled receptor spatial restraint database for 3D structure modeling and function annotation
SUMMARY: G protein-coupled receptors (GPCRs) comprise the largest family of integral membrane proteins. They are the most important class of drug targets. While crystal structures exist for only a very few GPCR sequences, numerous experiments have been performed on GPCRs to identify the critical residues and motifs. The GPCRRD database is designed to systematically collect all experimental res...